Bayesian spatial modeling of malnutrition and mortality among under-five children in sub-Saharan Africa.
Doctoral Degree. University of KwaZulu-Natal, Pietermaritzburg. The aim of this thesis is to develop and extend Bayesian statistical models in the
area of spatial modeling and apply them to child health outcomes, with particular
focus on childhood malnutrition and mortality among under-five children. The easy
availability of a geo-referenced database has stimulated a paradigm shift in methodological
approaches to spatial analysis. This study reviewed the spatial methods
and disease mapping models developed for areal (lattice) data analysis. Observational
data collected from complex design surveys and geographical locations often
violate the independence assumption of classical regression models. By relaxing the
restrictive linearity and normality assumptions of classical regression models, this
study first developed a flexible semi-parametric spatial model that accommodates
the usual fixed-effect, nonlinear and geographical components in a unified model.
The approach was explored in the analysis of spatial patterns of child birth outcomes
in Nigeria. The study also addressed the issue of disease clustering, which
is of interest to epidemiologists and public health officials. The study then proposed
a Bayesian hierarchical analysis approach for Poisson count data and formulated
a Poisson version of generalized linear mixed models (GLMMs) for analyzing
childhood mortality. The model simultaneously addressed the problem of overdispersion
and spatial dependence by the inclusion of the risk factors and random
effects in a single model. The proposed approach identified regions with elevated
relative risk or clustering of high mortality and evaluated the small-scale geographical
disparities in sub-populations across the regions. The study identified further
challenges in spatial data analysis, namely spatial autocorrelation and model misspecification.
The study then fitted geoadditive mixed (GAM) models to analyze
childhood anaemia data belonging to a family of exponential distributions (Gaussian,
binary and multinomial). The GAM models extend generalized linear
mixed models by allowing splines for continuous covariate (or time)
trends to enter alongside the parametric terms. Lastly, the shared component model originally
developed for multiple disease mapping was reviewed and modified to suit the binary
data at hand. A multivariate conditional autoregressive (MCAR) model was
developed and applied to jointly analyze three child malnutrition indicators. The
approach facilitated the estimation of conditional correlation between the diseases,
the assessment of their spatial association with the regions, and the mapping of geographical variation in individual
disease prevalence. The spatial analysis presented in this thesis is useful to
inform health-care policy and resource allocation. This thesis contributes to methodological
applications in life sciences, environmental sciences, public health and agriculture.
The present study expands the existing methods and tools for health impact
assessment in public health studies.
KEYWORDS: Conditional Autoregressive (CAR) model, Disease Mapping Models,
Multiple Disease mapping, Health Geography, Ecology Models, Spatial Epidemiology,
Childhood Health outcomes
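As a rough illustration of the relative-risk idea behind the disease mapping models described above, the following minimal sketch computes raw standardized mortality ratios (observed deaths divided by expected deaths) per region. It is not the thesis's hierarchical model: it has no covariates, random effects or spatial smoothing, and the function name and toy counts are hypothetical.

```python
def standardized_mortality_ratios(deaths, population):
    """Return observed/expected death ratios for each region.

    Expected deaths assume every region shares the overall death rate,
    so a ratio above 1 flags a region with more deaths than expected.
    """
    overall_rate = sum(deaths) / sum(population)
    expected = [overall_rate * n for n in population]
    return [d / e for d, e in zip(deaths, expected)]


# Hypothetical toy counts for three regions (not data from the thesis).
deaths = [10, 30, 20]
population = [1000, 1000, 1000]

smr = standardized_mortality_ratios(deaths, population)
# Indices of regions whose raw ratio exceeds 1 (elevated relative risk).
elevated = [i for i, ratio in enumerate(smr) if ratio > 1.0]
```

Raw ratios like these are unstable for regions with small populations, which is one motivation for the random-effects smoothing that Bayesian disease mapping models provide.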
Empirical statistical modelling for crop yields predictions: bayesian and uncertainty approaches
Includes bibliographical references. This thesis explores uncertainty statistics to model agricultural crop yields in situations where there are neither sampling observations nor historical records. The Bayesian approach to a linear regression model is useful for predicting crop yield when there are data quantity issues, when the model structure is uncertain, and when the regression model involves a large number of explanatory variables. Data quantity issues might occur when a farmer is cultivating a new crop variety, moving to a new farming location or introducing a new farming technology, where the situation may warrant a change in the current farming practice. The first part of this thesis involved the collection of data from domain experts and the elicitation of probability distributions. Uncertainty statistics, the foundations of uncertainty theory and the data gathering procedures were discussed in detail. We proposed a procedure for estimating uncertainty distributions. The procedure was then applied to agricultural data to fit uncertainty distributions to five cereal crop yields. A Delphi method was introduced and used to fit uncertainty distributions to multiple experts' data on sesame seed yield. The thesis defined an uncertainty distance and derived the distance between two uncertainty distributions. Lastly, we estimated the distance between a hypothesized distribution and an uncertainty normal distribution. Although the applicability of uncertainty statistics is limited to one-sample models, the approach provides a fast way to establish a standard for process parameters. Where no sampling observations exist or they are very expensive to acquire, the approach provides an opportunity to engage experts and come up with a model for guiding decision making.
In the second part, we fitted a linear regression model to a full dataset obtained from an agricultural survey of small-scale farmers using three methods: direct Markov Chain Monte Carlo (MCMC), Bayesian estimation (with a uniform prior) and maximum likelihood estimation (MLE). The three procedures yielded similar mean estimates, but the Bayesian credible intervals were narrower than the MLE confidence intervals. The predictive outcome of the estimated model was then assessed using simulated data for a set of covariates. The dataset was then randomly split into two halves. An informative prior was estimated from one half, called the "old data", using the Ordinary Least Squares (OLS) method. Three models were then fitted to the second half, called the "new data": a General Linear Model (GLM) (M1), a Bayesian model with a non-informative prior (M2) and a Bayesian model with an informative prior (M3). Leave-one-out cross-validation (LOOCV) was used to compare the predictive performance of these models. The Bayesian models showed better predictive performance than M1. M3 (with an informative prior) had a moderate average cross-validation (CV) error and CV standard error. The GLM performed worst, with the least average CV error and the highest CV standard error among the models. In model M3 (expert prior), the predictor variables were found to be significant based on 95% credible intervals. In contrast, most variables were not significant under models M1 and M2. Also, the model with the informative prior had narrower credible intervals than the model with the non-informative prior and the GLM. The results indicated that variability and uncertainty in the data were reasonably reduced by incorporating the expert (informative) prior. Lastly, we investigated the residual plots of these models to assess their prediction performance.
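The leave-one-out cross-validation comparison described above can be sketched as follows. This is a minimal illustration only: it assumes a one-predictor least-squares line rather than the thesis's GLM and Bayesian models, and the function names and toy data are hypothetical. It reports the average CV error and its standard error, the two quantities the abstract uses to rank the models.

```python
from statistics import mean, stdev


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def loocv_squared_errors(xs, ys):
    """Hold out each observation in turn, refit, and record its squared error."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_simple_ols(train_x, train_y)
        prediction = intercept + slope * xs[i]
        errors.append((ys[i] - prediction) ** 2)
    return errors


def cv_summary(errors):
    """Return (average CV error, standard error of that average)."""
    return mean(errors), stdev(errors) / len(errors) ** 0.5


# Hypothetical toy data (not from the thesis): input level vs. crop yield.
inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
yields = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

avg_error, std_error = cv_summary(loocv_squared_errors(inputs, yields))
```

To compare several models in this way, the same held-out observations would be predicted by each model and the resulting summaries placed side by side; a lower average CV error indicates better out-of-sample prediction.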
Bayesian Model Averaging (BMA) was later introduced to address the model structure uncertainty of a single model. BMA allows the computation of a weighted average over possible model combinations of predictors. An approximate AIC weight was then proposed for model selection instead of frequentist alternative hypothesis testing (or model comparison in a set of competing candidate models). The weight is flexible and easier to interpret than raw AIC or the Bayesian information criterion (BIC), which approximates the Bayes factor. Zellner's g-prior was considered appropriate as it has been widely used in linear models. It preserves the correlation structure among predictors in its prior covariance. The method also yields closed-form marginal likelihoods, which lead to huge computational savings by avoiding sampling in the parameter space as in BMA. Lastly, we determined a single optimal model from all possible combinations of models and also computed the log-likelihood of each model.
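To make the idea of an AIC-based model weight concrete, here is a minimal sketch of the standard Akaike weight calculation. Whether the thesis's "approximate AIC weight" matches this exact form is an assumption; the function name and the AIC values are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores into weights that sum to 1.

    Each weight is proportional to exp(-0.5 * delta), where delta is the
    model's AIC minus the smallest AIC in the candidate set.
    """
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical AIC scores for three candidate models (not from the thesis).
candidate_aic = [100.0, 102.0, 110.0]
weights = akaike_weights(candidate_aic)
# Index of the model with the largest weight, i.e. the lowest AIC.
best_model = max(range(len(weights)), key=weights.__getitem__)
```

Unlike raw AIC values, the weights sit on a 0 to 1 scale and can be read as the relative support for each candidate model within the set, which also makes them usable as averaging weights across models.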